2,838 research outputs found

    From HIV protein sequences to viral fitness landscapes: a new paradigm for in silico vaccine design

    Background: An inexpensive prophylactic vaccine offers the best hope to curb the HIV/AIDS epidemic gripping sub-Saharan Africa. Systematic means to guide the design of an effective immunogen for this, and other, infectious diseases are not available. What is required is a method to chart the peaks and valleys of viral fitness as a function of amino acid sequence. An efficacious vaccine would eject the virus from the high fitness peaks, and drive it into the valleys where its compromised fitness impairs its ability to replicate and inflict damage on the host. Methods: Appealing to spin glass models in statistical physics, we present a novel approach to translate viral sequence databases into landscapes of viral fitness. These inferred models furnish a quantitative description of viral replicative capacity as a function of amino acid sequence. We illustrate this approach in the development of landscapes for the proteins of HIV-1 clade B Gag. Results: In comparisons to experimental and clinical data, our inferred landscapes demonstrate excellent agreement with: 1) in vitro replicative fitness measurements, 2) clinically observed high-fitness circulating viral strains, 3) documented HLA-associated CTL escape mutations, and 4) intra-host temporal adaptation pathways revealed by deep sequencing. These favorable comparisons support our landscapes as reflections of intrinsic viral fitness. We illustrate the value of such descriptions in the computational design of a CTL Gag immunogen. Conclusion: We present a novel methodology to translate viral sequence data into quantitative landscapes of viral fitness. In an application to HIV-1 Gag, we illustrate excellent agreement of our model predictions with experimental and clinical data, and demonstrate a powerful new approach for HIV immunogen design. We anticipate that this approach may represent an unprecedented means to synthesize fitness landscapes for diverse pathogens, and may provide the basis for the design of improved prophylactic and therapeutic strategies.
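As a rough illustration of how a spin glass model can assign a fitness proxy to a sequence, the sketch below evaluates a Potts-type energy with single-site fields and pairwise couplings. The abstract does not state the exact model form, so the Hamiltonian, the function name `potts_energy`, and all parameter values here are assumptions chosen for a toy example; lower energy is read as higher inferred fitness.

```python
import numpy as np

def potts_energy(seq, h, J):
    """Toy Potts-type energy: E(s) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j].

    seq : sequence of integer residue states, one per site
    h   : array (L, q) of single-site fields (hypothetical values)
    J   : array (L, L, q, q) of pairwise couplings (hypothetical values)
    """
    n_sites = len(seq)
    energy = 0.0
    for i in range(n_sites):
        energy -= h[i, seq[i]]
        for j in range(i + 1, n_sites):
            energy -= J[i, j, seq[i], seq[j]]
    return float(energy)

# Three sites, two states per site (0 = consensus residue, 1 = mutant).
n_sites, n_states = 3, 2
h = np.zeros((n_sites, n_states))
h[:, 0] = 1.0                      # consensus residue favored at every site
J = np.zeros((n_sites, n_sites, n_states, n_states))
J[0, 1, 1, 1] = 0.5                # mutations at sites 0 and 1 partly compensate

e_consensus = potts_energy([0, 0, 0], h, J)
e_single = potts_energy([1, 0, 0], h, J)
e_double = potts_energy([1, 1, 0], h, J)
print(e_consensus, e_single, e_double)
```

In this toy landscape the double mutant is penalized less than two independent mutations would be, which is the kind of coupling effect a pairwise model can express and a site-independent model cannot.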

    Statistically optimal continuous free energy surfaces from biased simulations and multistate reweighting

    Free energies as a function of a selected set of collective variables are commonly computed in molecular simulation and are of significant value in understanding and engineering molecular behavior. These free energy surfaces are most commonly estimated using variants of histogramming techniques, but such approaches obscure two important facets of these functions. First, the empirical observations along the collective variable are defined by an ensemble of discrete observations, and the coarsening of these observations into histogram bins incurs unnecessary loss of information. Second, the free energy surface is itself almost always a continuous function, and its representation by a histogram introduces inherent approximations due to the discretization. In this study, we relate the discrete observations from biased simulations to the inferred underlying continuous probability distribution over the collective variables and derive histogram-free techniques for estimating this free energy surface. We reformulate free energy surface estimation as minimization of a Kullback-Leibler divergence between a continuous trial function and the discrete empirical distribution and show that this is equivalent to likelihood maximization of a trial function given a set of sampled data. We then present a fully Bayesian treatment of this formalism, which enables the incorporation of powerful Bayesian tools such as the inclusion of regularizing priors, uncertainty quantification, and model selection techniques. We demonstrate this new formalism in the analysis of umbrella sampling simulations for the χ torsion of a valine sidechain in the L99A mutant of T4 lysozyme with benzene bound in the cavity. Comment: 24 pages, 5 figures
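The equivalence between Kullback-Leibler minimization and likelihood maximization can be seen in a small histogram-free example. For N samples x_n, the divergence from the empirical distribution to a trial density p(x; θ) equals -(1/N) Σ log p(x_n; θ) plus a term that does not depend on θ, so minimizing one maximizes the other. The sketch below is a simplification and not the paper's estimator: it assumes unbiased samples (no umbrella biases or multistate reweighting), a periodic torsion, and a one-parameter trial free energy F(x) = -κ cos(x) in units of kT; the function names are hypothetical.

```python
import numpy as np

def mean_negative_log_likelihood(kappa_grid, samples):
    """Mean NLL of samples under p(x) = exp(kappa * cos x) / (2 * pi * I0(kappa)).

    Up to a kappa-independent constant this is the KL divergence from the
    empirical distribution of the samples to the trial density.
    """
    log_z = np.log(2.0 * np.pi * np.i0(kappa_grid))   # exact normalization
    return -kappa_grid * np.mean(np.cos(samples)) + log_z

def fit_kappa(samples, kappa_grid):
    """Return the grid value of kappa that minimizes the mean NLL."""
    nll = mean_negative_log_likelihood(kappa_grid, samples)
    return float(kappa_grid[np.argmin(nll)])

# Synthetic torsion samples drawn from a known density (assumed, unbiased).
rng = np.random.default_rng(0)
true_kappa = 2.0
samples = rng.vonmises(mu=0.0, kappa=true_kappa, size=5000)

kappa_grid = np.linspace(0.01, 10.0, 2000)
kappa_hat = fit_kappa(samples, kappa_grid)

# Continuous free energy estimate (in kT), defined at any x without binning.
x = np.linspace(-np.pi, np.pi, 5)
free_energy = -kappa_hat * np.cos(x)
print(kappa_hat, free_energy)
```

No bin width appears anywhere in the fit: every sample enters the objective at its exact position, and the fitted surface can be evaluated at arbitrary resolution.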

    Nonlinear Machine Learning and Design of Reconfigurable Digital Colloids

    Digital colloids, clusters of freely rotating “halo” particles tethered to the surface of a central particle, were recently proposed as ultra-high density memory elements for information storage. Rational design of these digital colloids for memory storage applications requires a quantitative understanding of the thermodynamic and kinetic stability of the configurational states within which information is stored. We apply nonlinear machine learning to Brownian dynamics simulations of these digital colloids to extract the low-dimensional intrinsic manifold governing digital colloid morphology, thermodynamics, and kinetics. By modulating the relative size ratio between halo particles and central particles, we investigate the size-dependent configurational stability and transition kinetics for the 2-state tetrahedral (N=4) and 30-state octahedral (N=6) digital colloids. We demonstrate the use of this framework to guide the rational design of a memory storage element to hold a block of text that trades off the competing design criteria of memory addressability and volatility.

    Machine learning assembly landscapes from particle tracking data

    Bottom-up self-assembly offers a powerful route for the fabrication of novel structural and functional materials. Rational engineering of self-assembling systems requires understanding of the accessible aggregation states and the structural assembly pathways. In this work, we apply nonlinear machine learning to experimental particle tracking data to infer low-dimensional assembly landscapes mapping the morphology, stability, and assembly pathways of accessible aggregates as a function of experimental conditions. To the best of our knowledge, this represents the first time that collective order parameters and assembly landscapes have been inferred directly from experimental data. We apply this technique to the nonequilibrium self-assembly of metallodielectric Janus colloids in an oscillating electric field, and quantify the impact of field strength, oscillation frequency, and salt concentration on the dominant assembly pathways and terminal aggregates. This combined computational and experimental framework furnishes new understanding of self-assembling systems, and quantitatively informs rational engineering of experimental conditions to drive assembly along desired aggregation pathways. © 2015 The Royal Society of Chemistry.

    DiAMoNDBack: Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces

    Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long time scales such as aggregation and folding. The reduced resolution realizes computational accelerations, but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only Cα coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the Cα trace and previously backmapped backbone and side chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side chain all-atom configurations consistent with the coarse-grained Cα trace. We train DiAMoNDBack over 65k+ structures from the Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side chain clashes, and diversity of the generated side chain configurational states. We make the DiAMoNDBack model publicly available as a free and open source Python package.